Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve bashlog.sh performance #313

Merged
merged 8 commits into from
Jul 7, 2022
Merged

Improve bashlog.sh performance #313

merged 8 commits into from
Jul 7, 2022

Conversation

reegnz
Copy link
Contributor

@reegnz reegnz commented Dec 14, 2021

Performance is improved by an order of magnitude by ensuring that
logging does not fork unnecessarily:

  • forking to date only happens if timestamps are actually used
  • forking to basename is replaced with bash expansion
  • forking to dirname is replaced with bash expansion

Fixes #196

Performance is improved by an order of magnitude by ensuring that
logging does not fork unnecessarily:
* forking to `date` only happens if timestamps are actually used
* forking to basename is replaced with bash expansion
* uppercasing is repaced with bash expansion

Fixes #196
@reegnz
Copy link
Contributor Author

reegnz commented Dec 14, 2021

Benchmarks

Before

time ./terraform -version
Terraform v1.1.0
on darwin_amd64
./terraform -version  0.44s user 0.48s system 62% cpu 1.489 total

After

time ./terraform -version
Terraform v1.1.0
on darwin_amd64
./terraform -version  0.07s user 0.04s system 78% cpu 0.147 total

I made sure that the log function stays backward compatible, so there shouldn't be any breaking changes.

@reegnz
Copy link
Contributor Author

reegnz commented Jan 11, 2022

@Zordrak could you please have a look?

lib/bashlog.sh Outdated
local facility="${BASHLOG_SYSLOG_FACILITY:-local0}";
local pid="${$}";

local level="${1}";
local upper="$(echo "${level}" | awk '{print toupper($0)}')";
local upper="${level^^}";
Copy link

@pinkplus pinkplus Feb 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to work on Mac. ${level^^} is introduced in Bash 4, while Mac by default only has Bash 3. Maybe use tr instead?

local upper="$(tr "[:lower:]" "[:upper:]" <<< "$level")";

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pinkplus, problem is, the performance issues of the whole bashlog.sh were because it's shelling out to external commands...

Copy link
Contributor Author

@reegnz reegnz Feb 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a reasonable compromise to expect people on macos to have a newer bash installed?
I'd expect devs to have a newer bash on macos instead of being stuck in the past.

The scripts take whatever bash is on your path (#!/usr/bin/env bash).
Both macports and homebrew can be used to install a newer bash version on MacOS.

Disclaimer: I'm also on MacOS, but the old built-in bash version never limited me, I always use bash from homebrew and expect bash scripts to have #!/usr/bin/env bash as a shebang instead of #!/bin/bash.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pinkplus I tested your suggestion, and it definitely has a perf impact:

time ./bin/terraform -version
Terraform v1.1.5
on darwin_amd64
./bin/terraform -version  0.16s user 0.17s system 69% cpu 0.474 total

With the bash feature:

time ./bin/terraform -version
Terraform v1.1.5
on darwin_amd64
./bin/terraform -version  0.08s user 0.05s system 83% cpu 0.158 total

I think that's a serious perf degradation and the goal of this PR is to improve perf.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that the point is to use bash parameter expansion and avoid processes. But this is a problem for Mac users. Mac only bundles bash 3.2. What's more, the default shell was changed to zsh several years ago. I would say it's not normal to expect bash 4, because users generally don't need it.

Replacing one call shouldn't make tfenv much slower. Even with tr it's already much faster than the current implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To save a few more cycles -- not sure how important it is -- can probably conditionally declare one of two wrappers depending on BASH_VERSION. So it's checked only once, not on each invocation.

Copy link
Contributor Author

@reegnz reegnz Mar 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about solving the wrapper issue by not calling bashlog at all during terraform shim calls unless explicitly required (eg. if you set an env variable to force debug logging, in case you need to debug the shim itself).

I'm currently limited in available cycles, so I'd welcome some collaboration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can probably help. I did a bit of experimenting yesterday to optimize the terraform shim codepath as much as possible.

What's the best way to collaborate? I can send a PR against your PR, or if you want to make me a collaborator on your fork, I can commit directly to the branch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cspotcode I've made you collaborator on my fork.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I will add a comment to this pull request to discuss what changes still need to be done.

@cspotcode
Copy link
Contributor

I just tested this. Here are some hyperfine results:

ubuntu:~/.tfenv$ hyperfine './versions/1.1.7/terraform version'
Benchmark 1: ./versions/1.1.7/terraform version
  Time (mean ± σ):      37.7 ms ±   9.4 ms    [User: 34.4 ms, System: 14.9 ms]
  Range (min … max):    27.2 ms …  69.6 ms    42 runs
 
ubuntu:~/.tfenv$ hyperfine './bin/terraform version'
Benchmark 1: ./bin/terraform version
  Time (mean ± σ):      85.3 ms ±  14.6 ms    [User: 69.1 ms, System: 27.3 ms]
  Range (min … max):    66.6 ms … 120.9 ms    30 runs

Only ~45ms of overhead is pretty good. I hope this gets merged!

@cspotcode
Copy link
Contributor

cspotcode commented Mar 8, 2022

Here is what I propose changing to maximize performance of the terraform shim, including @reegnz's idea for bashlog

  • conditionally enable bashlog.sh
    • @reegnz What environment variables do we check for this?
    • If bashlog is disabled, then declare log function as a shim.
  • Avoid all spawning of libexec executables by refactoring them into bash functions
    • The only ones used by the terraform shim are: (this list ignores commands that install new versions, which are expected to be slow)
      • tfenv itself
      • tfenv-exec
      • tfenv-version-name
      • tfenv-version-file
    • This refactor will not remove the ability to exec them
      • It will refactor their implementations into functions in lib
      • The libexec scripts will delegate to these functions.
      • The terraform shim will source and call these functions directly

@reegnz
Copy link
Contributor Author

reegnz commented Mar 8, 2022

@cspotcode I think we could use something like z BASHLOG_ENABLE=1 otherwise it should be a noop. We probably should default to no-log IF terraform is already installed and all that the wrapper is doing is selecting the installed version and exec-ing to it.
The terraform wrapper should default to disabled (so noop), unless the automatic installation happens, then we'd most definitely want to see what's going on.

@cspotcode
Copy link
Contributor

cspotcode commented Mar 8, 2022

How about TFENV_BASHLOG_ENABLE to namespace it the same way as the other environment variables?

We can declare a function enable_full_bashlog that can be called in the auto-install codepaths to switch to the full bashlog immediately before doing stuff that needs to be logged.

@cspotcode
Copy link
Contributor

Tests are passing locally but we need a maintainer to trigger the CI workflow.

@cspotcode
Copy link
Contributor

cspotcode commented Mar 9, 2022

I realized I did not explain my implementation.

I noticed that log debug ... is a no-op unless you have debugging enabled via log level. In the common case where you already have terraform installed, the only log messages in the terraform shim are debug logs.

So I changed helpers.sh to declare a function log shim that will lazy-load the full bashlog as soon as non-debug logging happens. If you have debug logging enabled, it does the right thing and loads bashlog immediately.

This avoids the need for any new environment variables, so it's not a behavioral change. Things should log in the same way they always have.

I also moved some of the libexec commands into lib as functions, which avoids a few spawns.

@reegnz
Copy link
Contributor Author

reegnz commented Mar 9, 2022

@cspotcode that is awesome!
I did some additional perf tweaking (getting rid of forking to basename and dirname in the hot path), it added another significant bump in perf:

Before tweaking:

❯ hyperfine -w 10 '~/.cache/terraform/versions/1.1.5/terraform version' './terraform version'
Benchmark 1: ~/.cache/terraform/versions/1.1.5/terraform version
  Time (mean ± σ):      36.3 ms ±   0.9 ms    [User: 31.7 ms, System: 13.5 ms]
  Range (min … max):    34.5 ms …  38.7 ms    67 runs

Benchmark 2: ./terraform version
  Time (mean ± σ):     105.8 ms ±   2.3 ms    [User: 50.8 ms, System: 36.8 ms]
  Range (min … max):   101.8 ms … 111.1 ms    25 runs

Summary
  '~/.cache/terraform/versions/1.1.5/terraform version' ran
    2.91 ± 0.10 times faster than './terraform version'

After the tweaks:

❯ hyperfine -w 10 '~/.cache/terraform/versions/1.1.5/terraform version' './terraform version'
Benchmark 1: ~/.cache/terraform/versions/1.1.5/terraform version
  Time (mean ± σ):      36.1 ms ±   1.0 ms    [User: 31.8 ms, System: 13.4 ms]
  Range (min … max):    34.0 ms …  39.8 ms    68 runs

Benchmark 2: ./terraform version
  Time (mean ± σ):      73.8 ms ±   1.5 ms    [User: 44.1 ms, System: 25.8 ms]
  Range (min … max):    70.9 ms …  77.6 ms    36 runs

Summary
  '~/.cache/terraform/versions/1.1.5/terraform version' ran
    2.05 ± 0.07 times faster than './terraform version'

This PR starts to look really good now!

For now the perf issue of the shim is solved, so it's fine for debug
logging to be of lower performance.
@reegnz
Copy link
Contributor Author

reegnz commented Mar 9, 2022

@pinkplus please have another look, I think everyone's requirements are met now with the latest code changes.

@pinkplus
Copy link

The idea of lazy loading log() is really clever. I've confirmed that it works with Bash 3 as well and is super fast. Thanks!

@cspotcode
Copy link
Contributor

Excellent, thanks for confirming!

Is there anything else Zordrak needs to merge this? Looks like automated tests run on both Mac and Linux, so I think we've covered all our bases.

@RoryKiefer
Copy link

Many thanks for tackling these issues, cspot/reegnz! ❤️ This is a fantastic performance improvement.

@cspotcode
Copy link
Contributor

@Zordrak anything else you need from us before this is merged?

@RoryKiefer
Copy link

RoryKiefer commented Apr 13, 2022

Is there anything anyone from the community can do to bolster this effort? (I may volunteer as tribute...)

@RoryKiefer
Copy link

RoryKiefer commented Apr 19, 2022

I went ahead and switched my tools test branch over to the fork, and have noticed the speed increase (and no ill effects after a week). If I encounter errors, I'll PR a fix to the fork. When TFEnv is in the critical path of an IntelliJ/JetBrains "File Watcher", it no longer keeps 10 instances of TFEnv running in parallel for several minutes while the machine fans go crazy. (MBP, Intel). In fact, no fan spinup at all. The difference is night and day and feels exactly like what my terraform wrapper felt like in this scenario before I added TFEnv to it.

Edit: I should add that my fans no longer even spin back up at all, even when Im plugged in to the left side of my machine, which is saying something. Before this branch, having IntelliJ open w/ a "file watcher" that ran TFEnv meant the fans were spinning frequently. Basically any time I ran a git operation on my terraform environment to elicit a fmt of the changed files.

@cspotcode
Copy link
Contributor

I think we need to wait for @Zordrak to merge. I wonder if there is something about this change that makes them hesitant to merge? Perhaps something that feels risky to them that we might be able to better explain to smooth out the code review process?

@cspotcode
Copy link
Contributor

Bump! Can we at least get this issue or pull request pinned? That way, anyone who hits similar performance issues will be able to find this branch and use it.

@cspotcode
Copy link
Contributor

Bump! Yet again... I am onboarding other members of my team onto some terraform projects. I would really like to get this merged. @Zordrak is there anything we can do to get this merged?

@Zordrak
Copy link
Collaborator

Zordrak commented Jul 7, 2022

Looking at this now. Apologies for the long delay and thank you very much for the massive effort here.

@Zordrak Zordrak merged commit d256f98 into tfutils:master Jul 7, 2022
@joelsdc
Copy link

joelsdc commented Jul 7, 2022

@Zordrak do you think you can make a release? This improvement is a BIG one.

@cspotcode
Copy link
Contributor

In the interim, the git installation method works well.

@reegnz reegnz deleted the bashlog_performance_improvement branch July 11, 2022 18:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

terraform executions delayed by ~ 1 second
6 participants